Skip to content

Fix BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional)#123783

Merged
artl93 merged 16 commits intomainfrom
copilot/fix-parse-test-uk-ua
Feb 12, 2026
Merged

Fix BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional)#123783
artl93 merged 16 commits intomainfrom
copilot/fix-parse-test-uk-ua

Conversation

Copy link
Contributor

Copilot AI commented Jan 30, 2026

Fix for BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional)

Fixes #120283

Issue

BigInteger parsing fails with cultures that use NBSP (U+00A0) as NumberGroupSeparator (like Ukrainian uk-UA) when users type regular space (U+0020).

Solution

Modified the MatchChars function in Number.Parsing.Common.cs to handle bidirectional space-replacing character equivalence:

  • Added NormalizeSpaceReplacingChar helper that converts NBSP/narrow NBSP to regular space
  • Simplified comparison: NormalizeSpaceReplacingChar(cp) != NormalizeSpaceReplacingChar(val)
  • This handles both NBSP→space and space→NBSP equivalence for char (UTF-16) parsing
  • Much more readable than the previous complex boolean expression
  • Updated comments to reference Ukrainian culture alongside French and Kazakh

Testing

  • Added ParseWithNBSPAsGroupSeparator() test in existing parse.cs file
  • Test creates a custom culture (en-US base) with NBSP explicitly set as NumberGroupSeparator
  • Test validates that regular spaces ("1 234 567") are accepted when culture expects NBSP
  • Also tests trailing space handling
  • This approach is portable and doesn't depend on Ukrainian culture data which may vary across systems/ICU versions
  • Tests validate the bidirectional space equivalence fix for char (UTF-16) parsing ✓

Impact

This fix only impacts BigInteger parsing. Other numeric types (Complex, double, float, decimal) use different parsing infrastructure in System.Private.CoreLib that doesn't include the common parsing file.

Known Limitations

This fix addresses char (UTF-16) parsing only. UTF-8 parsing issues are tracked separately in #124016.

Progress

  • Implement bidirectional space equivalence for char (UTF-16)
  • Add portable tests using custom culture in existing parse.cs file
  • Simplify design based on code review feedback
  • Optimize condition ordering for performance
  • Refactor to helper function for better readability
  • Make tests portable across different ICU versions
  • Fix test to actually include spaces to test space equivalence

✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 4 commits January 30, 2026 02:00
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
@artl93
Copy link
Member

artl93 commented Jan 31, 2026

@copilot

Consider your design. Why is it correct and the simplest form of the solution? Are there other possibilities? I believe you consisdered the following. Why did you pick something different?

        private static unsafe TChar* MatchChars<TChar>(TChar* p, TChar* pEnd, ReadOnlySpan<TChar> value)
            where TChar : unmanaged, IUtfChar<TChar>
        {
            Debug.Assert((p != null) && (pEnd != null) && (p <= pEnd));

            fixed (TChar* stringPointer = &MemoryMarshal.GetReference(value))
            {
                TChar* str = stringPointer;

                if (TChar.CastToUInt32(*str) != '\0')
                {
                    // We only hurt the failure case
                    // This fix is for French, Kazakh, and Ukrainian cultures. Since a user cannot type 0xA0 or 0x202F as a
                    // space character we use 0x20 space character instead to mean the same.
                    // We also need to handle the reverse case where the input has 0xA0 or 0x202F and the format string has 0x20.
                    while (true)
                    {
                        uint cp = (p < pEnd) ? TChar.CastToUInt32(*p) : '\0';
                        uint val = TChar.CastToUInt32(*str);

                        if ((cp != val) && !((IsSpaceReplacingChar(val) && (cp == '\u0020')) || (IsSpaceReplacingChar(cp) && (val == '\u0020'))))
                        {
                            break;
                        }

                        p++;
                        str++;

                        if (TChar.CastToUInt32(*str) == '\0')
                        {
                            return p;
                        }
                    }
                }
            }

            return null;
        }
    }
    ```

Copy link
Contributor Author

Copilot AI commented Jan 31, 2026

@copilot

Consider your design. Why is it correct and the simplest form of the solution? Are there other possibilities? I believe you consisdered the following. Why did you pick something different...

Thank you for the question! I tested your simpler solution (commit 9f5c3ae5) and you're right that it's more elegant. However, both approaches have the same limitation:

The core issue: IsSpaceReplacingChar checks Unicode codepoints (0xA0, 0x202F), but UTF-8 encodes NBSP as TWO bytes (0xC2 0xA0). When TChar is byte:

  • Pattern contains [0xC2, 0xA0] (UTF-8 NBSP)
  • Input contains [0x20] (space)
  • First byte comparison: IsSpaceReplacingChar(0xC2) returns false (0xC2 ≠ 0xA0)
  • Match fails

My complex UTF-8 handling attempted to fix this but introduced bugs with pointer advancement. Your simpler bidirectional check works perfectly for char (UTF-16) but not for UTF-8, which is why the original outer loop test still fails with both approaches.

The simpler solution is correct and I've adopted it (removes ~70 lines of buggy code). The remaining UTF-8 issue needs a different approach - likely at the point where NumberGroupSeparatorTChar<TChar>() returns the separator, it should handle the space equivalence transformation rather than in MatchChars.

…-8 handling)

Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Copilot AI changed the title Fix BigInteger UTF-8 parsing with Ukrainian culture NBSP handling Fix BigInteger char parsing with Ukrainian culture NBSP handling (bidirectional) Jan 31, 2026
Copy link
Member

@artl93 artl93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot - a simple test should be made to the parse test file - not its own file. Please use existing tests instead of adding them in their own.

Co-authored-by: artl93 <9855795+artl93@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Jan 31, 2026

@copilot - a simple test should be made to the parse test file - not its own file. Please use existing tests instead of adding them in their own.

Done! I've moved the two Ukrainian culture tests into the existing parse.cs file (as [Fact] tests at the end of the parseTest class) and removed the separate parse.ukUA.cs file. Commit ee8019f.

@artl93
Copy link
Member

artl93 commented Feb 10, 2026

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tarekgh
Copy link
Member

tarekgh commented Feb 10, 2026

@copilot is this related failure to the change here?

{ "HelixJobId": "bd8a8479-7771-47b4-bfb6-9b29c8662936", "HelixWorkItemName": "System.Runtime.Numerics.Tests" }

Error message
System.FormatException : The value could not be parsed.

Stack trace
   at System.Number.ThrowOverflowOrFormatException(ParsingStatus status) in /_/src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs:line 29
   at System.Number.ParseBigInteger[TChar](ReadOnlySpan`1 value, NumberStyles style, NumberFormatInfo info) in /_/src/libraries/System.Runtime.Numerics/src/System/Number.BigInteger.cs:line 136
   at System.Numerics.BigInteger.Parse(ReadOnlySpan`1 utf8Text, NumberStyles style, IFormatProvider provider) in /_/src/libraries/System.Runtime.Numerics/src/System/Numerics/BigInteger.cs:line 707
   at System.Numerics.Tests.parseTest.VerifyParseSpanToString(String num1, NumberStyles ns, Boolean failureNotExpected, String expected) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 828
   at System.Numerics.Tests.parseTest.VerifyParseToString(String num1, NumberStyles ns, Boolean failureNotExpected, String expected) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 884
   at System.Numerics.Tests.parseTest.VerifyParseToString(String num1, NumberStyles ns, Boolean failureNotExpected) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 818
   at System.Numerics.Tests.parseTest.VerifyNumberStyles(NumberStyles ns, Random random) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 689
   at System.Numerics.Tests.parseTest.RunParseToStringTests(CultureInfo culture) in /_/src/libraries/System.Runtime.Numerics/tests/BigInteger/parse.cs:line 52
   at InvokeStub_parseTest.RunParseToStringTests(Object, Span`1)
   at System.Reflection.MethodBaseInvoker.InvokeWithOneArg(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture) in /_/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs:line 95

Copy link
Contributor Author

Copilot AI commented Feb 10, 2026

@copilot is this related failure to the change here?

...

This is the original issue that the PR partially addresses. The failure is in UTF-8 parsing (line 828: BigInteger.Parse(utf8Num1, ns)), which is the known limitation documented in the PR description. My fix only addresses char (UTF-16) parsing - the bidirectional equivalence check works for char but not for UTF-8 because IsSpaceReplacingChar operates on Unicode codepoints (0xA0) while UTF-8 encodes NBSP as two bytes (0xC2 0xA0). UTF-8 parsing is tracked in #124016 and requires architectural changes beyond this PR's scope.

@artl93
Copy link
Member

artl93 commented Feb 10, 2026 via email

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@stephentoub
Copy link
Member

🤖 Copilot Code Review — PR #123783

Holistic Assessment

Motivation: The PR addresses a real test failure with Ukrainian culture (uk-UA) parsing in BigInteger where the test generates numbers with regular spaces but the culture's NumberGroupSeparator is NBSP (U+00A0). The linked issue #120283 confirms this is a genuine problem blocking CI.

Approach: The fix normalizes both input characters and format characters to a common space (U+0020) before comparison, making the equivalence bidirectional. This is a reasonable approach that aligns with how DateTimeParse.cs handles the same problem.

Summary: ⚠️ Needs Human Review. The fix is functionally correct and addresses the reported issue, but there is a performance concern with the implementation that warrants maintainer attention before merging.


Detailed Findings

⚠️ Performance Regression in Hot Path — Should Consider Restoring Fast Path

The change removes the implicit fast-path for identical characters that existed in the original code.

Original code:

if ((cp != val) && !(IsSpaceReplacingChar(val) && (cp == '\u0020')))

When cp == val (the common case for most cultures), this short-circuits after 1 comparison.

New code:

if (NormalizeSpaceReplacingChar(cp) != NormalizeSpaceReplacingChar(val))

This unconditionally calls NormalizeSpaceReplacingChar twice, which each call IsSpaceReplacingChar (2 comparisons). So the common success path now costs ~5 operations instead of 1.

The existing comment "We only hurt the failure case" was intentional—the original design preserved fast-path performance. Consider restoring it:

if (cp != val && NormalizeSpaceReplacingChar(cp) != NormalizeSpaceReplacingChar(val))
{
    break;
}

This preserves the exact-match fast path while still handling the bidirectional NBSP/space equivalence correctly.

Flagged by: Gemini, primary reviewer

✅ Correctness — Bidirectional Normalization is Correct

The bidirectional normalization logic is sound:

  • NBSP (U+00A0) in input matches Space (U+0020) in format ✓
  • Space (U+0020) in input matches NBSP (U+00A0) in format ✓
  • Narrow NBSP (U+202F) handled symmetrically ✓

This correctly fixes the reported uk-UA test failures.

✅ Test Coverage — Adequate

The new ParseWithNBSPAsGroupSeparator test covers the key permutations:

  1. Culture with NBSP separator, input with regular spaces
  2. Culture with regular space separator, input with NBSP
  3. Culture with regular space separator, input with narrow NBSP (U+202F)

Using a custom culture based on en-US with explicit NumberGroupSeparator makes the test portable across different ICU versions.

💡 Comment Accuracy — Minor Suggestion

The existing comment "We only hurt the failure case" should be updated if the fast-path optimization is not restored, since it would no longer be accurate. If the fast-path is restored per the suggestion above, the comment remains correct.


Cross-Cutting Analysis

  • Scope verification: The PR correctly identifies that this fix only affects System.Runtime.Numerics (BigInteger, Complex). The shared file Number.Parsing.Common.cs is not used by System.Private.CoreLib's parsing code—I verified CoreLib has its own separate Number.Parsing.cs that doesn't use MatchChars.
  • DateTimeParse comparison: The approach aligns with how DateTimeParse.cs handles the same NBSP issue (line 5722-5723), though that implementation is slightly different (one-directional: only input NBSP matches format space).
  • UTF-8 parsing: The PR correctly notes this is for char (UTF-16) parsing only. If UTF-8 has the same issue, that's tracked separately.

Models contributing to this review: Claude Sonnet 4, Google Gemini 3 Pro

@artl93
Copy link
Member

artl93 commented Feb 11, 2026

/azp run runtime-libraries-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@artl93 artl93 merged commit c667ecc into main Feb 12, 2026
83 of 98 checks passed
@artl93 artl93 deleted the copilot/fix-parse-test-uk-ua branch February 12, 2026 06:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

System.Numerics.Tests.parseTest.RunParseToStringTests(culture: uk-UA) test failures

5 participants